
DukeSpace

Statistical analysis and significance testing of serial analysis of gene expression data using a Poisson mixture model

Author: A Dempster
AT Weeraratna
D Porter
DA Porter
F Leisch
F van Ruissen
G Schwarz
GJ McLachlan
H Akaike
H Matsumura
HH Thygesen
J Lu
K Boon
KA Baggerly
M Cornelissen
R Development Core Team
R Edgar
RZ Vencio
S Lee
S Saha
Scott D Zuyderduyn
VA Kuznetsov
VE Velculescu
WN Venables
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract. Background: Serial analysis of gene expression (SAGE) is used to obtain quantitative snapshots of the transcriptome. These profiles are count-based and are assumed to follow a Binomial or Poisson distribution. However, tag counts observed across multiple libraries (for example, one or more groups of biological replicates) have additional variance that cannot be accommodated by this assumption alone. Several models have been proposed to account for this effect, all of which utilize a continuous prior distribution to explain the excess variance. Here, a Poisson mixture model, which assumes excess variability arises from sampling a mixture of distinct components, is proposed and the merits of this model are discussed and evaluated. Results: The goodness of fit of the Poisson mixture model on 15 sets of biological SAGE replicates is compared to the previously proposed hierarchical gamma-Poisson (negative binomial) model, and a substantial improvement is seen. Two observations lend further support to the mixture model: 1) an increase in the number of mixture components needed to fit the expression of tags representing more than one transcript; and 2) a tendency for components to cluster libraries into the same groups. A confidence score is presented that can identify tags that are differentially expressed between groups of SAGE libraries. Several examples where this test outperforms those previously proposed are highlighted. Conclusion: The Poisson mixture model performs well as a) a method to represent SAGE data from biological replicates, and b) a basis to assign significance when testing for differential expression between multiple groups of replicates. Code for the R statistical software package is included to assist investigators in applying this model to their own data.
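The mixture idea described in the abstract above can be illustrated with a minimal sketch. This is not the authors' R code: it is a plain expectation-maximization (EM) fit of a k-component Poisson mixture, written in Python for illustration. The function names (`poisson_logpmf`, `fit_poisson_mixture`), the quantile-based initialization, the fixed iteration count, and the choice to scale each component's rate by library size are all assumptions made here, and the counts in the usage lines are made-up toy values.

```python
import math


def poisson_logpmf(x, mu):
    """Log of the Poisson probability mass function at count x with mean mu."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)


def fit_poisson_mixture(counts, sizes, k=2, iters=200):
    """EM fit for one tag: counts[i] is the tag count in library i and
    sizes[i] is that library's total tag count (hypothetical inputs).
    Assumes count_i ~ Poisson(sizes[i] * lam[j]) under component j."""
    n_lib = len(counts)
    # Initialize component rates from quantiles of the observed proportions.
    observed = sorted(c / n for c, n in zip(counts, sizes))
    lam = [max(observed[int((j + 0.5) * n_lib / k)], 1e-12) for j in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each library.
        resp = []
        for x, n in zip(counts, sizes):
            logp = [math.log(weights[j]) + poisson_logpmf(x, n * lam[j])
                    for j in range(k)]
            top = max(logp)  # subtract the maximum for numerical stability
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update mixing weights and per-component rates.
        for j in range(k):
            mass = sum(r[j] for r in resp)
            exposure = sum(r[j] * n for r, n in zip(resp, sizes))
            tags = sum(r[j] * x for r, x in zip(resp, counts))
            # Small floors keep the logarithms in the E-step defined.
            weights[j] = max(mass / n_lib, 1e-12)
            lam[j] = max(tags / max(exposure, 1e-300), 1e-12)
    return weights, lam, resp


# Toy usage: six libraries of equal size, two apparent expression levels.
counts = [10, 12, 9, 48, 52, 50]
sizes = [10000] * 6
weights, lam, resp = fit_poisson_mixture(counts, sizes, k=2)
# Assign each library to its most probable component.
labels = [max(range(2), key=lambda j: r[j]) for r in resp]
```

In this sketch the number of components k is fixed by the caller; choosing k from the data (for example with an information criterion) is left out.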

Public Library of Science (PLOS)

Multicentric validation of proteomic biomarkers in urine specific for diabetic nephropathy

Background: Urine proteome analysis is rapidly emerging as a tool for diagnosis and prognosis in disease states. For diagnosis of diabetic nephropathy (DN), urinary proteome analysis was successfully applied in a pilot study. The validity of the previously established proteomic biomarkers with respect to the diagnostic and prognostic potential was assessed on a separate set of patients recruited at three different European centers. In this case-control study of 148 Caucasian patients with diabetes mellitus type 2 and duration >= 5 years, cases of DN were defined as albuminuria >300 mg/d and diabetic retinopathy (n = 66). Controls were matched for gender and diabetes duration (n = 82). Methodology/Principal Findings: Proteome analysis was performed blinded using high-resolution capillary electrophoresis coupled with mass spectrometry (CE-MS). Data were evaluated employing the previously developed model for DN. Upon unblinding, the model for DN showed 93.8% sensitivity and 91.4% specificity, with an AUC of 0.948 (95% CI 0.898-0.978). Of 65 previously identified peptides, 60 were significantly different between cases and controls of this study. In <10% of cases and controls, classification by proteome analysis did not fully match the expected clinical outcome. Analysis of the patients' subsequent clinical course revealed later progression to DN in some of the control patients who had been classified as false positives for DN. Conclusions: These data provide the first independent confirmation that profiling of the urinary proteome by CE-MS can adequately identify subjects with DN, supporting the generalizability of this approach. The data further establish urinary collagen fragments as biomarkers for diabetes-induced renal damage that may serve as earlier and more specific biomarkers than the currently used urinary albumin.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Enlighten

Blasted Cell Line Names

Author: Ang K. Kian
Baggerly Keith A.
Byers Lauren A.
Coombes Kevin R.
Girard Luc
Giri Uma
Heymach John V.
Liu Wenbin
Minna John D.
Myers Jeffrey N.
Shen Li
Story Michael D.
Wang Jing
Yordy John S.
Publication venue: Libertas Academica
Publication date: 01/10/2010

arXiv.org e-Print Archive

Robust Detection of Hierarchical Communities from Escherichia coli Gene Expression Data

Author: A Beyer
AL Barabási
BH Good
BW Kernighan
CO Daub
D Duewer
D Marbach
DFT Veiga
E Bonnet
E Ravasz
E Segal
EH Davidson
F Luo
G Balázsi
G Getz
G Palla
H Zare
HW Ma
J Chen
J Duch
J Hubble
J Lemke
J Reichardt
JJ Faith
JN Weinstein
K Baggerly
Kevin E. Bassler
KY Yeung
M Blatt
M Riley
MB Eisen
MEJ Newman
MF Traxler
MM Barker
N Friedman
O Alter
PD Karp
Q Lu
R Guimerà
RA Irizarry
S Fortunato
S Gama-Castro
S Raychaudhuri
S Tavazoie
Santiago Treviño
Satoru Miyano
SB Seidman
SP Borgatti
TF Cooper
Tim F. Cooper
TS Gardner
U Brandes
UN Raghavan
X Wen
Y Benjamini
Y Sun
Yudong Sun
Z Shi
Publication venue: Public Library of Science (PLoS)
Publication date: 11/01/2012

Determining the functional structure of biological networks is a central goal of systems biology. One approach is to analyze gene expression data to infer a network of gene interactions on the basis of their correlated responses to environmental and genetic perturbations. The inferred network can then be analyzed to identify functional communities. However, commonly used algorithms can yield unreliable results due to experimental noise, algorithmic stochasticity, and the influence of arbitrarily chosen parameter values. Furthermore, the results obtained typically provide only a simplistic view of the network partitioned into disjoint communities and provide no information on the relationships between communities. Here, we present methods to robustly detect coregulated and functionally enriched gene communities and demonstrate their application and validity for Escherichia coli gene expression data. Applying a recently developed community detection algorithm to the network of interactions identified with the context likelihood of relatedness (CLR) method, we show that a hierarchy of network communities can be identified. These communities are significantly enriched for gene ontology (GO) terms, consistent with them representing biologically meaningful groups. Further, analysis of the most significantly enriched communities identified several candidate new regulatory interactions. The robustness of our methods is demonstrated by showing that a core set of functional communities is reliably found when artificial noise, modeling experimental noise, is added to the data. We find that noise mainly acts conservatively, increasing the relatedness required for a network link to be reliably assigned and decreasing the size of the core communities, rather than causing association of genes into new communities.
Comment: Due to appear in PLoS Computational Biology. Supplementary Figure S1 was not uploaded but is available by contacting the author. 27 pages, 5 figures, 15 supplementary files

CiteSeerX

Public Library of Science (PLOS)

FigShare

BRCA2 inhibition enhances cisplatin-mediated alterations in tumor cell proliferation, metabolism, and metastasis

Author: Baggerly Keith A.
Buensuceso Adrian
Chambers Ann F.
Deroo Bonnie J.
Di Cresce Christine
Ferguson Peter J.
Figueredo Rene
Herbrich Shelley M.
Koropatnick James
Leong Hon S.
Maleki Vareki Saman
Romanow Larissa
Rytelewski Mateusz
Shepherd Trevor
Sood Anil K.
Tong Jessica G.
Vincent Mark
Wu Sherry Y.
Publication venue: Scholarship@Western
Publication date: 01/12/2014

Tumor cells have unstable genomes relative to non-tumor cells. Decreased DNA integrity resulting from tumor cell instability is important in generating favorable therapeutic indices, and intact DNA repair mediates resistance to therapy. Targeting DNA repair to promote the action of anti-cancer agents is therefore an attractive therapeutic strategy. BRCA2 is involved in homologous recombination repair. BRCA2 defects increase cancer risk but, paradoxically, cancer patients with BRCA2 mutations have better survival rates. We queried TCGA data and found that BRCA2 alterations led to increased survival in patients with ovarian and endometrial cancer. We developed a BRCA2-targeting second-generation antisense oligonucleotide (ASO), which sensitized human lung, ovarian, and breast cancer cells to cisplatin by as much as 60%. BRCA2 ASO treatment overcame acquired cisplatin resistance in head and neck cancer cells, but induced minimal cisplatin sensitivity in non-tumor cells. BRCA2 ASO plus cisplatin reduced respiration as an early event preceding cell death, concurrent with increased glucose uptake without a difference in glycolysis. BRCA2 ASO and cisplatin decreased metastatic frequency in vivo by 77%. These results implicate BRCA2 as a regulator of metastatic frequency and cellular metabolic response following cisplatin treatment. BRCA2 ASO, in combination with cisplatin, is a potential therapeutic anti-cancer agent.

Scholarship@Western

Clustering-based approaches to SAGE data mining

Author: C Keime
D Porter
F Rioult
Francisco Azuaje
GM Boratyn
H Chen
H Thygesen
H Wang
H Zheng
Haiying Wang
Huiru Zheng
I Mechaly
J Handl
J Lu
J Sander
J Stollberg
JB Vos
JM Ruijter
K Kim
KA Baggerly
L Cai
MA El-Meanawy
MA Gilchrist
MB Eisen
MC Abba
MZ Man
N Bolshakova
P Buckhaults
P Divina
P Tamayo
RT Ng
RZ Vêncio
S Audic
S Blackshaw
S McIntosh
S Saha
SD Zuyderduyn
T Beißbarth
T Chu
T Kohonen
T Lee
VE Velculescu
VR Akmaev
W Chan
W Yasui
WD Patino
X Jin
Publication venue: BioMed Central
Publication date: 01/01/2008

Serial analysis of gene expression (SAGE) is one of the most powerful tools for global gene expression profiling. It has led to several biological discoveries and biomedical applications, such as the prediction of new gene functions and the identification of biomarkers in human cancer research. Clustering techniques have become fundamental approaches in these applications. This paper reviews relevant clustering techniques specifically designed for this type of data. It places an emphasis on current limitations and opportunities in this area for supporting biologically meaningful data mining and visualisation.

Relative impact of key sources of systematic noise in Affymetrix and Illumina gene-expression microarray experiments

Author: A Brazma
A Tichopad
AE Pozhitkov
AH Sims
Andrew H Sims
Arthur A Simen
B Langmead
D van der Veen
ES Lander
GK Smyth
J Michael Dixon
J Neter
J Pinheiro
JM Akey
John MS Bartlett
JT Leek
K Baggerly
K Kuhn
KA Baggerly
KD Pruitt
KL Thompson
L Guo
L Mittempergher
L Shi
M Barnes
M Benito
M Lindstrom
MA Mongan
MAQC Consortium
MJ Dunning
N Barbosa-Morais
N Laird
N Novoradovskaya
R Edgar
R Gentleman
R Ihaka
R Owczarzy
RA Verdugo
RD Canales
Robert R Kitchen
RR Kitchen
RS Spielman
SO Zakharkin
VG Tusher
Vicky S Sabine
VS Sabine
W Jin
W Shi
W Tong
WE Johnson
WP Kuo
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Systematic processing noise, which includes batch effects, is very common in microarray experiments but is often ignored despite its potential to confound or compromise experimental results. Compromised results are most likely when re-analysing or integrating datasets from public repositories due to the different conditions under which each dataset is generated. To better understand the relative noise contributions of various factors in experimental design, we assessed several Illumina and Affymetrix datasets for technical variation between replicate hybridisations of Universal Human Reference (UHRR) and individual or pooled breast-tumour RNA. Results: A varying degree of systematic noise was observed in each of the datasets; however, in all cases the relative amount of variation between standard control RNA replicates was found to be greatest at earlier points in the sample-preparation workflow. For example, 40.6% of the total variation in reported expression was attributed to replicate extractions, compared to 13.9% due to amplification/labelling and 10.8% between replicate hybridisations. Deliberate probe-wise batch-correction methods were effective in reducing the magnitude of this variation, although the level of improvement was dependent on the sources of noise included in the model. Systematic noise introduced at the chip, run, and experiment levels of a combined Illumina dataset was found to be highly dependent upon the experimental design. Both UHRR and pools of RNA, which were derived from the samples of interest, modelled technical variation well, although the pools were significantly better correlated (4% average improvement) and better emulated the effects of systematic noise, over all probes, than the UHRRs. The effect of this noise was not uniform over all probes, with low GC-content probes found to be more vulnerable to batch variation than probes with a higher GC-content.
Conclusions: The magnitude of systematic processing noise in a microarray experiment is variable across probes and experiments; however, it is generally the case that procedures earlier in the sample-preparation workflow are liable to introduce the most noise. Careful experimental design is important to protect against noise, detailed meta-data should always be provided, and diagnostic procedures should be routinely performed prior to downstream analyses for the detection of bias in microarray studies.

Edinburgh Research Explorer

Modeling SAGE tag formation and its effects on data interpretation within a Bayesian framework

Author: A Beyer
A Gelman
EH Hurowitz
HH Thygesen
Hong Qin
J Colinge
JS Morris
K Dolinski
KA Baggerly
L Cai
L David
L Zhang
M Harbers
MD Stern
Michael A Gilchrist
Russell Zaretzki
RZN Vencio
S Audic
SL Madden
T Beissbarth
VA Kuznetsov
VE Velculescu
VR Akmaev
Wolfram Research Inc
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract. Background: Serial Analysis of Gene Expression (SAGE) is a high-throughput method for inferring mRNA expression levels from experimentally generated sequence-based tags. Standard analyses of SAGE data, however, ignore the fact that the probability of generating an observable tag varies across genes and between experiments. As a consequence, these analyses result in biased estimators and posterior probability intervals for gene expression levels in the transcriptome. Results: Using the yeast Saccharomyces cerevisiae as an example, we introduce a new Bayesian method of data analysis which is based on a model of SAGE tag formation. Our approach incorporates the variation in the probability of tag formation into the interpretation of SAGE data and allows us to derive exact joint and approximate marginal posterior distributions for the mRNA frequency of genes detectable using SAGE. Our analysis of these distributions indicates that the frequency of a gene in the tag pool is influenced by its mRNA frequency, the cleavage efficiency of the anchoring enzyme (AE), and the number of informative and uninformative AE cleavage sites within its mRNA. Conclusion: With a mechanistic, model-based approach for SAGE data analysis, we find that inter-genic variation in SAGE tag formation is large. However, this variation can be estimated and, importantly, accounted for using the methods we develop here. As a result, SAGE-based estimates of mRNA frequencies can be adjusted to remove the bias introduced by the SAGE tag formation process.

University of Tennessee, Knoxville: Trace